1 Mathematics

Math is not just a way of calculating numerical answers; it is a way of thinking, using clear definitions for concepts and rigorous logic to organize our thoughts and back up our assertions.

Cheng (2025)

These lecture notes use:

  • algebra
  • precalculus
  • univariate calculus
  • linear algebra
  • vector calculus

Some key results are listed here.

1.1 Elementary Algebra

Equalities

Theorem 1 (Equalities are transitive) If \(a=b\) and \(b=c\), then \(a=c\).

Theorem 2 (Substituting equivalent expressions) If \(a = b\), then for any function \(f(x)\), \(f(a) = f(b)\).
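
A short illustration of Theorem 2, taking \(f(x) = x^2\): if \(a = b\), then

\[a^2 = b^2\]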

Inequalities

Theorem 3 If \(a<b\), then \(a+c < b+c\).

Theorem 4 (negating both sides of an inequality) If \(a < b\), then \(-a > -b\).

Theorem 5 If \(a < b\) and \(c > 0\), then \(ca < cb\). (If \(c = 0\), then \(ca = cb\).)

Theorem 6 \[-a = (-1) \times a\]
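
Theorem 4 can be derived from Theorem 3 by adding \(c = -a-b\) to both sides of \(a < b\):

\[a + (-a-b) < b + (-a-b)\]

which simplifies to \(-b < -a\); that is, \(-a > -b\).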

Sums

Theorem 7 (adding zero changes nothing) \[a+0=a\]

Theorem 8 (Sums are symmetric) \[a+b = b+a\]

Theorem 9 (Sums are associative)  

\[(a + b) + c = a + (b + c)\]

Products

Theorem 10 (Multiplying by 1 changes nothing) \[a \times 1 = a\]

Theorem 11 (Products are symmetric) \[a \times b = b \times a\]

Theorem 12 (Products are associative) \[(a \times b) \times c = a \times (b \times c)\]

Division

Theorem 13 (Division can be written as a product) For \(b \neq 0\): \[\frac {a}{b} = a \times \frac{1}{b}\]

Sums and products together

Theorem 14 (Multiplication is distributive) \[a(b+c) = ab + ac\]
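
For example, applying Theorem 14 twice (together with Theorem 11) expands a product of two sums:

\[(a+b)(c+d) = (a+b)c + (a+b)d = ac + bc + ad + bd\]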

Quotients

Definition 1 (Quotients, fractions, rates) A quotient (also called a fraction or rate) is an expression of the form

\[\frac{a}{b}\]

where \(a\) is called the numerator and \(b\) is called the denominator.

Definition 2 (Ratios) A ratio is a quotient in which the numerator and denominator are measured using the same unit scales.

Definition 3 (Proportion) In statistics, a “proportion” typically means a ratio where the numerator represents a subset of the denominator.

Definition 4 (Proportional) Two functions \(f(x)\) and \(g(x)\) are proportional if their ratio \(\frac{f(x)}{g(x)}\) does not depend on \(x\). (cf. https://en.wikipedia.org/wiki/Proportionality_(mathematics))
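
For example, \(f(x) = 3x\) and \(g(x) = x\) are proportional, since

\[\frac{f(x)}{g(x)} = \frac{3x}{x} = 3\]

wherever the ratio is defined.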

Additional reference for proportions: https://en.wikipedia.org/wiki/Population_proportion#Mathematical_definition

1.2 Exponentials and Logarithms

Theorem 15 (logarithm of a product is the sum of the logs of the factors) \[ \log{\left(a\cdot b\right)} = \log{a} + \log{b} \]

Corollary 1 (logarithm of a quotient)  

\[\log{\frac{a}{b}} = \log{a} - \log{b}\]
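
Corollary 1 follows from Theorem 13, Theorem 15, and Theorem 16:

\[\log{\frac{a}{b}} = \log{\left(a \cdot b^{-1}\right)} = \log{a} + \log{\left\{b^{-1}\right\}} = \log{a} - \log{b}\]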

Theorem 16 (logarithm of an exponential function) \[ \text{log}{\left\{a^b\right\}} = b \cdot\text{log}{\left\{a\right\}} \]

Theorem 17 (exponential of a sum)  

\[\text{exp}{\left\{a+b\right\}} = \text{exp}{\left\{a\right\}} \cdot\text{exp}{\left\{b\right\}}\]

Corollary 2 (exponential of a difference)  

\[\text{exp}{\left\{a-b\right\}} = \frac{\text{exp}{\left\{a\right\}}}{\text{exp}{\left\{b\right\}}}\]

Theorem 18 (exponential of a product) \[a^{bc} = {\left(a^b\right)}^c = {\left(a^c\right)}^b\]

Corollary 3 (natural exponential of a product) \[\text{exp}{\left\{ab\right\}} = (\text{exp}{\left\{a\right\}})^b = (\text{exp}{\left\{b\right\}})^a\]
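
Corollary 3 is Theorem 18 with base \(e\): writing \(\text{exp}{\left\{x\right\}} = e^x\),

\[e^{ab} = {\left(e^a\right)}^b = {\left(e^b\right)}^a\]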

Exercise 1 For \(a \ge 0\) and \(b,c \in \mathbb{R}\), when does \((a^b)^c = a^{(b^c)}\)?

Solution 1. Short answer: rarely (that’s all you need to know for this course).

Long answer:

If \((a^b)^c = a^{(b^c)}\), then since \((a^b)^c = a^{bc}\), we have: \[a^{bc} = a^{(b^c)}\] \[\text{log}{\left\{a^{bc}\right\}} = \text{log}{\left\{a^{(b^c)}\right\}}\] \[bc \cdot \text{log}{\left\{a\right\}} = b^c\cdot \text{log}{\left\{a\right\}} \tag{1}\]

Equation 1 holds in each of the following cases:

  1. \(bc = b^c\) (see Exercise 2).
  2. \(a=1\) (i.e., \(\text{log}{\left\{a\right\}} = 0\)).
  3. \(a=0\) (i.e., \(\text{log}{\left\{a\right\}}= -\infty\)) and \(\text{sign}{\left\{bc\right\}}=\text{sign}{\left\{b^c\right\}}\).

In particular, when \(a=0\) and \(c=0\), \(bc = 0\) and \(b^c = 1\) (for any \(b \in \mathbb{R}\)), so \(\text{sign}{\left\{bc\right\}}\neq \text{sign}{\left\{b^c\right\}}\), and \((a^b)^c \neq a^{(b^c)}\):

\[ \begin{aligned} (a^b)^c &= (0^b)^0 \\ &= 1 \end{aligned} \]

\[ \begin{aligned} a^{(b^c)} &= 0^{(b^0)} \\ &= 0^1 \\ &= 0 \end{aligned} \]

Exercise 2 For \(b,c \in \mathbb{R}\), when does \(b^c = bc\)?

Solution 2. \(bc = b^c\) in each of the following cases:

  1. \(c = 1\).
  2. \(b=0\) and \(c > 0\).
  3. \(b = \text{exp}{\left\{\frac{\log{c}}{c-1}\right\}}\) (for \(c > 0\), \(c \neq 1\)).
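
As a quick check of case 3, take \(c = 2\):

\[b = \text{exp}{\left\{\frac{\log{2}}{2-1}\right\}} = 2, \qquad b^c = 2^2 = 4 = 2 \times 2 = bc\]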

See the red contours in Figure 2 for a visualization.

```r
`b*c_f` <- function(b, c) b*c
`b^c_f` <- function(b, c) b^c
values_b <- seq(0, 5, by = .01)
values_c <- seq(-.5, 3, by = .01)

`b*c` <- outer(values_b, values_c, `b*c_f`)
`b^c` <- outer(values_b, values_c, `b^c_f`)
`b^c`[is.infinite(`b^c`)] <- NA

opacity <- .3
z_min <- min(`b*c`, `b^c`, na.rm = TRUE)
z_max <- 5
plotly::plot_ly(
  x = ~values_b,
  y = ~values_c
) |>
  plotly::add_surface(
    z = ~ t(`b*c`),
    contours = list(
      z = list(
        show = TRUE,
        start = -1,
        end = 1,
        size = .1
      )
    ),
    name = "b*c",
    showscale = FALSE,
    opacity = opacity,
    colorscale = list(c(0, 1), c("green", "green"))
  ) |>
  plotly::add_surface(
    opacity = opacity,
    colorscale = list(c(0, 1), c("red", "red")),
    z = ~ t(`b^c`),
    contours = list(
      z = list(
        show = TRUE,
        start = z_min,
        end = z_max,
        size = .2
      )
    ),
    showscale = FALSE,
    name = "b^c"
  ) |>
  plotly::layout(
    scene = list(
      xaxis = list(title = "b"),
      yaxis = list(title = "c"),
      zaxis = list(
        range = c(z_min, z_max),
        title = "outcome"
      ),
      camera = list(eye = list(x = -1.25, y = -1.25, z = 0.5)),
      aspectratio = list(x = .9, y = .8, z = 0.7)
    )
  )
```

Figure 1: Graph of \(b*c\) and \(b^c\)
```r
`b^c - b*c_f` <- function(b, c) `b^c_f`(b, c) - `b*c_f`(b, c)

mat1 <- outer(values_b, values_c, `b^c - b*c_f`)
mat1[is.infinite(mat1)] <- NA

opacity <- .3
plotly::plot_ly(
  x = ~values_b,
  y = ~values_c
) |>
  plotly::add_surface(
    z = ~ t(mat1),
    contours = list(
      z = list(
        show = TRUE,
        start = 0,
        end = 1,
        size = 1,
        color = "red"
      )
    ),
    name = "b^c - b*c",
    showscale = TRUE,
    opacity = opacity
  ) |>
  plotly::layout(
    scene = list(
      xaxis = list(title = "b"),
      yaxis = list(title = "c"),
      zaxis = list(title = "outcome"),
      camera = list(eye = list(x = -1.25, y = -1.25, z = 0.5)),
      aspectratio = list(x = .9, y = .8, z = 0.7)
    )
  )
```

Figure 2: Graph of \(b^c - b*c\). Red contour lines show where \(b^c = b*c\).

Theorem 19 (\(\text{exp}{\left\{\right\}}\) and \(\text{log}{\left\{\right\}}\) are mutual inverses) For \(a > 0\): \[\text{exp}{\left\{\text{log}{\left\{a\right\}}\right\}} = \text{log}{\left\{\text{exp}{\left\{a\right\}}\right\}} = a\] (The identity \(\text{log}{\left\{\text{exp}{\left\{a\right\}}\right\}} = a\) in fact holds for all \(a \in \mathbb{R}\).)

1.3 Derivatives

Theorem 20 (Constant rule) \[\frac{\partial}{\partial x}c = 0\]

Theorem 21 (Constant multiple rule) If \(a\) is constant with respect to \(x\), then: \[\frac{\partial}{\partial x}ay = a \frac{\partial y}{\partial x}\]

Theorem 22 (Power rule) \[\frac{\partial}{\partial x}x^q = qx^{q-1}\]
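
For example, combining Theorem 21 and Theorem 22:

\[\frac{\partial}{\partial x} 5x^3 = 5 \cdot 3x^{2} = 15x^2\]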

Theorem 23 (Derivative of natural logarithm) \[\text{log}'{\left\{x\right\}} = \frac{1}{x} = x^{-1}\]

Theorem 24 (derivative of exponential) \[\text{exp}'{\left\{x\right\}} = \text{exp}{\left\{x\right\}}\]

Theorem 25 (Product rule) \[(ab)' = ab' + ba'\]

Theorem 26 (Quotient rule) \[(a/b)' = a'/b - (a/b^2)b'\]
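
The quotient rule can be recovered from the product rule (Theorem 25), the power rule (Theorem 22), and the chain rule (Theorem 27):

\[(a/b)' = {\left(a b^{-1}\right)}' = a'b^{-1} + a{\left(b^{-1}\right)}' = \frac{a'}{b} + a\left(-b^{-2}\right)b' = a'/b - (a/b^2)b'\]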

Theorem 27 (Chain rule) \[\begin{aligned} \frac{\partial a}{\partial c} &= \frac{\partial a}{\partial b} \frac{\partial b}{\partial c} \\ &= \frac{\partial b}{\partial c} \frac{\partial a}{\partial b} \end{aligned} \]

or in Euler/Lagrange notation:

\[(f(g(x)))' = g'(x) f'(g(x))\]

Corollary 4 (Chain rule for logarithms) \[ \frac{\partial}{\partial x}\log{f(x)} = \frac{f'(x)}{f(x)} \]

Proof. Apply Theorem 27 and Theorem 23.
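
For example, taking \(f(x) = x^2 + 1\) in Corollary 4:

\[\frac{\partial}{\partial x}\log{\left(x^2+1\right)} = \frac{2x}{x^2+1}\]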

1.4 Linear Algebra

Definition 5 (Dot product/linear combination/inner product) For any two real-valued vectors \(\tilde{x}= (x_1, \ldots, x_n)\) and \(\tilde{y}= (y_1, \ldots, y_n)\), the dot-product, linear combination, or inner product of \(\tilde{x}\) and \(\tilde{y}\) is:

\[\tilde{x}\cdot \tilde{y}= \tilde{x}^{\top} \tilde{y}\stackrel{\text{def}}{=}\sum_{i=1}^nx_i y_i\]

Theorem 28 (Dot product is symmetric) The dot product is symmetric:

\[\tilde{x}\cdot \tilde{y}= \tilde{y}\cdot \tilde{x}\]

Proof. Apply Theorem 11 (products are symmetric) to each term of the sum in Definition 5:

\[\tilde{x}\cdot \tilde{y}= \sum_{i=1}^n x_i y_i = \sum_{i=1}^n y_i x_i = \tilde{y}\cdot \tilde{x}\]

1.5 Vector Calculus

(adapted from Fieller (2016), §7.2)

Let \(\tilde{x}\) and \(\tilde{\beta}\) be vectors of length \(p\); in other words, matrices of dimension \(p \times 1\):

\[ \tilde{x}= \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{p} \end{bmatrix} \]

\[ \tilde{\beta}= \begin{bmatrix} \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{p} \end{bmatrix} \]

Definition 6 (Transpose) The transpose of a column vector is the row vector with the same sequence of entries:

\[ \tilde{x}' \equiv \tilde{x}^\top \equiv [x_1, x_2, ..., x_p] \]

Example 1 (Dot product as matrix multiplication) \[ \begin{aligned} \tilde{x}\cdot \tilde{\beta} &= \tilde{x}^{\top} \tilde{\beta} \\ &= [x_1, x_2, ..., x_p] \begin{bmatrix} \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{p} \end{bmatrix} \\ &= x_1\beta_1+x_2\beta_2 +...+x_p \beta_p \end{aligned} \]

Theorem 29 (Transpose of a sum) \[(\tilde{x}+\tilde{y})^{\top} = \tilde{x}^{\top} + \tilde{y}^{\top}\]

Definition 7 (Vector derivative) If \(f(\tilde{\beta})\) is a function that takes a vector \(\tilde{\beta}\) as input, such as \(f(\tilde{\beta}) = \tilde{x}'\tilde{\beta}\), then:

\[ \frac{\partial}{\partial \tilde{\beta}} f(\tilde{\beta}) = \begin{bmatrix} \frac{\partial}{\partial \beta_1}f(\tilde{\beta}) \\ \frac{\partial}{\partial \beta_2}f(\tilde{\beta}) \\ \vdots \\ \frac{\partial}{\partial \beta_p}f(\tilde{\beta}) \end{bmatrix} \]

Definition 8 (Row-vector derivative) If \(f(\tilde{\beta})\) is a function that takes a vector \(\tilde{\beta}\) as input, such as \(f(\tilde{\beta}) = \tilde{x}'\tilde{\beta}\), then:

\[ \frac{\partial}{\partial \tilde{\beta}^{\top}} f(\tilde{\beta}) = \begin{bmatrix} \frac{\partial}{\partial \beta_1}f(\tilde{\beta}) & \frac{\partial}{\partial \beta_2}f(\tilde{\beta}) & \cdots & \frac{\partial}{\partial \beta_p}f(\tilde{\beta}) \end{bmatrix} \]

Theorem 30 (Row and column derivatives are transposes) \[\frac{\partial}{\partial \tilde{\beta}^{\top}} f(\tilde{\beta}) = {\left(\frac{\partial}{\partial \tilde{\beta}} f(\tilde{\beta})\right)}^{\top}\]

\[\frac{\partial}{\partial \tilde{\beta}} f(\tilde{\beta}) = {\left(\frac{\partial}{\partial \tilde{\beta}^{\top}} f(\tilde{\beta})\right)}^{\top}\]

Theorem 31 (Derivative of a dot product) \[ \frac{\partial}{\partial \tilde{\beta}} \tilde{x}\cdot \tilde{\beta}= \frac{\partial}{\partial \tilde{\beta}} \tilde{\beta}\cdot \tilde{x}= \tilde{x} \]

Proof. \[ \begin{aligned} \frac{\partial}{\partial \beta} (x^{\top}\beta) &= \begin{bmatrix} \frac{\partial}{\partial \beta_1}(x_1\beta_1+x_2\beta_2 +...+x_p \beta_p ) \\ \frac{\partial}{\partial \beta_2}(x_1\beta_1+x_2\beta_2 +...+x_p \beta_p ) \\ \vdots \\ \frac{\partial}{\partial \beta_p}(x_1\beta_1+x_2\beta_2 +...+x_p \beta_p ) \end{bmatrix} \\ &= \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{p} \end{bmatrix} \\ &= \tilde{x} \end{aligned} \]

Definition 9 (Quadratic form) A quadratic form is a mathematical expression with the structure

\[\tilde{x}^{\top} \mathbf{S} \tilde{x}\]

where \(\tilde{x}\) is a vector of length \(p\) and \(\mathbf{S}\) is a \(p \times p\) matrix.

Theorem 32 (Derivative of a quadratic form) If \(S\) is a symmetric \(p\times p\) matrix that is constant with respect to \(\beta\), then:

\[ \frac{\partial}{\partial \beta} \beta'S\beta = 2S\beta \]
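
One way to see this is to write the quadratic form componentwise as \(\beta'S\beta = \sum_{i=1}^p\sum_{j=1}^p \beta_i S_{ij} \beta_j\), so that

\[\frac{\partial}{\partial \beta_k} \beta'S\beta = \sum_{j=1}^p S_{kj}\beta_j + \sum_{i=1}^p \beta_i S_{ik} = {\left((S + S^{\top})\beta\right)}_k\]

which equals \(2{\left(S\beta\right)}_k\) when \(S\) is symmetric.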

Corollary 5 (Derivative of a simple quadratic form) \[ \frac{\partial}{\partial \tilde{\beta}} \tilde{\beta}'\tilde{\beta}= 2\tilde{\beta} \]

Theorem 33 (Vector chain rule) \[\frac{\partial z}{\partial \tilde{x}} = \frac{\partial y}{\partial \tilde{x}} \frac{\partial z}{\partial y}\]

or in Euler/Lagrange notation:

\[(f(g(\tilde{x})))' = \tilde{g}'(\tilde{x}) f'(g(\tilde{x}))\]

Corollary 6 (Vector chain rule for quadratic forms) \[\frac{\partial}{\partial \tilde{\beta}}{{\left(\tilde{\varepsilon}(\tilde{\beta})\cdot \tilde{\varepsilon}(\tilde{\beta})\right)}} = {\left(\frac{\partial}{\partial \tilde{\beta}}\tilde{\varepsilon}(\tilde{\beta})\right)} {\left(2 \tilde{\varepsilon}(\tilde{\beta})\right)}\]
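
As a sketch of how Corollary 6 gets used (assuming, for illustration, a linear residual function \(\tilde{\varepsilon}(\tilde{\beta}) = \tilde{y}- X\tilde{\beta}\), for which \(\frac{\partial}{\partial \tilde{\beta}}\tilde{\varepsilon}(\tilde{\beta}) = -X^{\top}\)):

\[\frac{\partial}{\partial \tilde{\beta}}{\left(\tilde{\varepsilon}(\tilde{\beta})\cdot \tilde{\varepsilon}(\tilde{\beta})\right)} = {\left(-X^{\top}\right)}{\left(2\tilde{\varepsilon}(\tilde{\beta})\right)} = -2X^{\top}(\tilde{y}- X\tilde{\beta})\]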

1.6 Additional resources

Calculus

  • Banner (2007)
  • Kaplan (2022)
  • Khuri (2003)
  • Miller (2016)

Linear Algebra and Vector Calculus

  • Fieller (2016)
  • Banerjee and Roy (2014)
  • Searle and Khuri (2017)

Numerical Analysis

Real Analysis

  • Grinberg (2017)

Banerjee, Sudipto, and Anindya Roy. 2014. Linear Algebra and Matrix Analysis for Statistics. Vol. 181. Boca Raton: CRC Press. https://www.routledge.com/Linear-Algebra-and-Matrix-Analysis-for-Statistics/Banerjee-Roy/p/book/9781420095388.
Banner, Adrian D. 2007. The Calculus Lifesaver: All the Tools You Need to Excel at Calculus. A Princeton Lifesaver Study Guide. Princeton, New Jersey: Princeton University Press. https://press.princeton.edu/books/paperback/9780691130880/the-calculus-lifesaver.
Cheng, Eugenia. 2025. “Opinion | How Math Turned Me from a D.E.I. Skeptic to a Supporter.” The New York Times. https://www.nytimes.com/2025/09/05/opinion/math-dei.html.
Dobson, Annette J, and Adrian G Barnett. 2018. An Introduction to Generalized Linear Models. 4th ed. CRC Press. https://doi.org/10.1201/9781315182780.
Fieller, Nick. 2016. Basics of Matrix Algebra for Statistics with R. Chapman & Hall/CRC. https://doi.org/10.1201/9781315370200.
Grinberg, Raffi. 2017. The Real Analysis Lifesaver: All the Tools You Need to Understand Proofs. 1st ed. Princeton Lifesaver Study Guides. Princeton: Princeton University Press. https://press.princeton.edu/books/paperback/9780691172934/the-real-analysis-lifesaver.
Kaplan, Daniel. 2022. MOSAIC Calculus. https://www.mosaic-web.org.
Khuri, André I. 2003. Advanced Calculus with Applications in Statistics. John Wiley & Sons. https://www.wiley.com/en-us/Advanced+Calculus+with+Applications+in+Statistics%2C+2nd+Edition-p-9780471391043.
Miller, Steven J. 2016. The Probability Lifesaver: Calculus Review Problems. https://web.williams.edu/Mathematics/sjmiller/public_html/probabilitylifesaver/index.htm.
Searle, Shayle R, and Andre I Khuri. 2017. Matrix Algebra Useful for Statistics. John Wiley & Sons.